[MNT] Diagnose and address long test runtimes (#1633)#1692
[MNT] Diagnose and address long test runtimes (#1633)#1692Abhishek9639 wants to merge 3 commits intoopenml:mainfrom
Conversation
- Add global per-test timeout (600s) to pytest config - CI: report all test durations (--durations=0) for diagnosis - CI: add explicit --timeout=600 to prevent hanging tests - Optimize verify_cache_state fixture: scope function -> module - Add scripts/profile_tests.sh for local duration profiling
geetu040
left a comment
There was a problem hiding this comment.
Thanks, scripts/profile_tests.sh file will be used, but I not sure about other duration and timeout related changes. See comments below.
.github/workflows/test.yml
Outdated
| fi | ||
|
|
||
| pytest -n 4 --durations=20 --dist load -sv $codecov -o log_cli=true -m "$marks" | ||
| pytest -n 4 --durations=0 --timeout=600 --dist load -sv $codecov -o log_cli=true -m "$marks" |
There was a problem hiding this comment.
durations: should not be set to 0, we could possibly use this in CI
timeout: we need to think about this, I would try to avoid setting this explicitly, but let's discuss. what's your reasoning to set a timeout here?
There was a problem hiding this comment.
Makes sense, I've reverted both changes durations is back to 20 and removed timeout from CI.
There was a problem hiding this comment.
this file looks fine and we probably need this
duration, timeout and output file path should take value from users like markers and should have default value
There was a problem hiding this comment.
Update done The script now supports m (marker), d (durations), t (timeout), and o (output file) as CLI arguments, each with sensible default values. Let me know if you'd like any tweaks.
tests/conftest.py
Outdated
|
|
||
|
|
||
| @pytest.fixture(autouse=True, scope="function") | ||
| @pytest.fixture(autouse=True, scope="module") |
There was a problem hiding this comment.
why did you have to change this?
There was a problem hiding this comment.
You're right, this change wasn't necessary. I've reverted it back to scope="function". Thanks for pointing it out
…script - Revert CI workflow to original --durations=20 (no timeout) - Remove global timeout from pyproject.toml - Revert conftest.py verify_cache_state scope to function - Update profile_tests.sh: accept CLI args (-m, -d, -t, -o) with defaults
|
you should mention the issue #1633 without the keyword |
|
@geetu040, |
| pytest \ | ||
| --durations="$DURATIONS" \ | ||
| --timeout="$TIMEOUT" \ | ||
| -q \ | ||
| -m "$MARKER_FILTER" \ | ||
| 2>&1 | tee "$OUTPUT_FILE" |
There was a problem hiding this comment.
I would also prefer additional argument -n for more workers and remove -q to get full pytest output
pytest \
+ --dist=load \
+ -n=$NUM_WORKERS \
--durations="$DURATIONS" \
--timeout="$TIMEOUT" \
- -q \
-m "$MARKER_FILTER" \
2>&1 | tee "$OUTPUT_FILE"This would mimic the exact pytest command in CI
There was a problem hiding this comment.
Thanks for pointing out the changes.
I’ll make the changes.
- Add -n flag for parallel workers (default: 4) - Add --dist=load to distribute tests across workers - Remove -q flag for full pytest output - Mimics exact pytest command used in CI
|
@geetu040, |
Fixes #1633
Changes
Current CI test runs take 1–2+ hours. This PR diagnoses the bottleneck and implements several improvements:
Root Cause
The
production_servertests (74 tests) make live API calls to openml.org, taking ~1h 23m in CI even with 4-worker parallelization.Improvements
Global per-test timeout (
pyproject.toml)timeout = 600(10 min) to[tool.pytest.ini_options]CI workflow improvements (
.github/workflows/test.yml)--durations=20→--durations=0to report ALL test durations for diagnosis--timeout=600to all 3 pytest invocationsFixture optimization (
tests/conftest.py)verify_cache_statefixture scope fromfunction→moduleBenchmark script (
scripts/profile_tests.sh)Test Distribution Analysis
production_servertest_serversklearn-onlyVerification